SINOD - Slovenian non-native speech database
Abstract
This paper presents the SINOD database, the first Slovenian non-native speech database. It will be used to improve the performance of a large vocabulary continuous speech recogniser for non-native speakers. The main quality improvement is expected in the acoustic models and the recogniser's vocabulary. The SINOD database is designed as a supplement to the Slovenian BNSI Broadcast News database, and the same BN recommendations were used for both databases. Two interviews with non-native Slovenian speakers were incorporated in the set. Both non-native speakers were female, whereas the journalist was a male native speaker of Slovenian. The transcription approach applied in the production phase is presented, and various statistics and analyses of the database are given in the paper.
Similar resources
Acquisition and Annotation of Slovenian Lombard Speech Database
This paper presents the acquisition and annotation of Slovenian Lombard Speech Database, the recording of which started in the year 2008. The database was recorded at the University of Maribor, Slovenia. The goal of this paper is to describe the hardware platform used for the acquisition of speech material, recording scenarios and tools used for the annotation of Slovenian Lombard Speech Databa...
Objective analysis of emotional speech for English and Slovenian Interface emotional speech databases
In this paper we propose a new approach to the analysis of emotional speech prosody features. The aim of the analysis is to define emotional features that characterise emotions. The analysis was performed on emotional speech databases that were recorded in the framework of the project "Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments" (Interface). Th...
Labeling of Prosodic Events in Slovenian Speech Database GOPOLIS
The paper describes prosodic annotation procedures of the GOPOLIS Slovenian speech data database and methods for automatic classification of different prosodic events. Several statistical parameters concerning duration and loudness of words, syllables and allophones were computed for the Slovenian language, for the first time on such a large amount of speech data. The evaluation of the annotate...
BNSI Slovenian broadcast news database - speech and text corpus
This paper presents the BNSI Slovenian Broadcast News database project. The result of the project is a database with speech and text corpus oriented toward large vocabulary continuous speech recognition in general domain. The speech corpus consists of 36 hours of transcribed evening and late night news. The raw database material was captured in the archive of national broadcaster RTV Slovenia t...
Speech Recognition of Slovenian and Croatian Weather Forecasts
In the paper we present some results of a joint project in speech data collection and speech recognition of Slovenian and Croatian weather forecasts. In the paper we describe the procedures we have performed in order to obtain a domain specific speech database from broadcast programmes. Additionally the speech recognition experiments are described and some speech recognition results for the Cro...